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Abstract 

Background: Much of the data resolution of the haploid non-recombining Y chromosome (NRY) haplogroup 0 in 
East Asia are still rudimentary and could be an explanatory factor for current debates on the settlement history of 
Island Southeast Asia (ISEA). Here, 81 slowly evolving markers (mostly SNPs) and 17 Y-chromosomal short tandem 
repeats were used to achieve higher level molecular resolution. Our aim is to investigate if the distribution of NRY 
DNA variation in Taiwan and ISEA is consistent with a single pre-Neolithic expansion scenario from Southeast China 
to all ISEA, or if it better fits an expansion model from Taiwan (the OOT model), or whether a more complex history 
of settlement and dispersals throughout ISEA should be envisioned. 

Results: We examined DNA samples from 1658 individuals from Vietnam, Thailand, Fujian, Taiwan (Han, plain tribes 
and 14 indigenous groups), the Philippines and Indonesia. While haplogroups 01a*-M1 19, O1a1*-P203, O1a2-M50 
and O3a2-P201 follow a decreasing dine from Taiwan towards Western Indonesia, 02a1-M95/M88, 03a*-M324, 
O3a1c-IMS-JST00261 1 and 03a2c1a-M133 decline northward from Western Indonesia towards Taiwan. Compared to 
the Taiwan plain tribe minority groups the Taiwanese Austronesian speaking groups show little genetic paternal 
contribution from Han. They are also characterized by low Y-chromosome diversity, thus testifying for fast drift in 
these populations. However, in contrast to data provided from other regions of the genome, Y-chromosome gene 
diversity in Taiwan mountain tribes significantly increases from North to South. 

Conclusion: The geographic distribution and the diversity accumulated in the 01a*-M1 19, O1a1*-P203, O1a2-M50 
and O3a2-P201 haplogroups on one hand, and in the 02a1-M95/M88, 03a*-M324, O3a1c-IMS-JST00261 1 and 
03a2c1a-M133 haplogroups on the other, support a pincer model of dispersals and gene flow from the mainland 
to the islands which likely started during the late upper Paleolithic, 18,000 to 15,000 years ago. The branches of the 
pincer contributed separately to the paternal gene pool of the Philippines and conjointly to the gene pools of 
Madagascar and the Solomon Islands. The North to South increase in diversity found for Taiwanese Austronesian 
speaking groups contrasts with observations based on mitochondrial DNA, thus hinting to a differentiated 
demographic history of men and women in these populations. 
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Background 

The Taiwanese population comprises three groups of 
inhabitants. Taiwan Han (TwHan), presently the most 
numerous group, are descendants mainly of immigrants 
who came to Taiwan -400 years ago and include Minnan 
and Hakka, as well as Han Chinese who immigrated more 
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recently from all over China. The Taiwan mountain tribes 
Aborigines (TwMtA) include 12 of the 14 officially recog- 
nized ethnic groups of the island and represent about 2% 
of its total population. Finally, a minority group generally 
described as the Taiwan plain tribe Aborigines (TwPlt) or 
Pingpu, who are believed to be heavily sinicized and mixed 
with the Taiwanese Han [1]. This last group represents 
less than 1% of the population in Taiwan. 

TwMtA, Filipinos and Indonesian people speak languages 
that belong to the Austronesian languages family [2]. Ten 
primary branches are usually recognized at the roots of this 
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linguistic family [3]. Nine of these branches are found 
exclusively in Taiwan and constitute the first-order sub- 
group of the Austronesian language family. Austronesian 
languages spoken outside Taiwan, including the Taiwan 
offshore Yami language, belong to the tenth branch, 
Malayo-Polynesian [3]. This branch comprises more 
than 1,200 separate languages that are spoken over a huge 
geographic region covering Madagascar in the Indian 
Ocean, Island Southeast Asia (ISEA) and hundreds of 
Oceanic islands all the way east towards Easter Island in 
the Pacific. Thus, because of the particular geographic 
distribution of the first-order subgroups of Austronesian, 
Taiwan is often considered as the potential original home- 
land of Austronesian speakers. 

In addition to linguistics, scholars have also examined 
evidence from archaeology and genetics to determine the 
original homeland of Austronesian speakers. The most con- 
sistent and generally accepted view based on archaeological 
evidence [4] suggests that Proto- Austronesian speakers, the 
ancestors of present-day Austronesian populations, reached 
ISEA via Taiwan, some 4,500 years ago, before the Iron 
Age, and possibly concomitant with the migration of early 
farming communities who expanded in Southeast China 
as a result of favorable climate changes [5]. 

Recent advances in molecular genetics using human 
mitochondrial DNA (mtDNA) complete genome sequen- 
cing, and genotyping of the haploid non-recombining Y 
chromosome (NRY) have generated a large amount of 
genetic data from populations in continental Asia and 
ISEA [6-14]. Using mtDNA from diverse genetic pools, 
Melton et al. [15] were first to support the idea that all 
present-day Austronesian speakers from Taiwan, ISEA 
and the Pacific region can retrace their origins with roots 
in Southeast Asia, via Taiwan (4). Later, other geneticists 
corroborated that the maternal genetic structure of the 
Austronesian speakers of Taiwan and of western ISEA 
(the Philippines and Western Indonesia) showed some re- 
markable similarities [16-19], an observation also echoed 
by linguistic evidence [20]. Such genetic results stimulated 
much debate when trying to correlate scenarios of past 
dispersal routes inferred independently by phylogeogra- 
phy, archeology and linguistics [21,22]. To reduce the 
controversy and possible contradictions between the vari- 
ous hypotheses, such as the express train and slow boat 
models to name a few [5,23], a more lenient description 
of the migration of the Austronesian peoples was later 
redefined as the "Out of Taiwan" (OOT) hypothesis [24]. 
Here the OOT identifies a unique movement of people 
leaving southeastern Taiwan more than 4,000 years ago 
and moving toward the Pacific and later to the Indian 
Ocean. Differences of opinion developed when Soares and 
colleagues [17], using mtDNA haplogroup E data, showed 
that it was also possible that people moved northward 
from ISEA toward Taiwan. This was soon supported by 



additional studies [18,25] both of which inferred a bi- 
directional genetic corridor between Taiwan and the 
Philippines, as well as others that showed possible simul- 
taneous northern and southern passages to Taiwan and to 
Indonesia [14,26,27]. 

Despite all contentions, the use of maternally transmitted 
genetic markers from mtDNA and paternally transmitted 
markers such as single nucleotide polymorphisms (Y-SNPs) 
and microsatellites (Y-STRs) of the non-recombining 
portion of the Y chromosome (NRY) have now become 
powerful tools to describe independently distinct genetic 
patterns within and between populations and to retrace 
their movements across regions. These methods have not 
only become complementary to each other, but because of 
the lack of recombination, they are also very informative 
in tracing human prehistory, temporally and spatially [6]. 

Several studies have retraced the earliest expansion 
Y-chromosome macro-haplogroup 01-M175 in East 
Asia and ISEA [28-31]. It has now been shown that many 
Y-STR lineages defined in the background of 01-M175 
and seen among TwMtA and ISEA populations derive 
their ancestry independently from Daic-speaking groups 
as a result of a pre-Neolithic demographic expansion from 
Southeast China [11,27,30]. It has also been shown that 
not all paternal lineages observed among the populations 
of ISEA trace back their origin to Taiwan, but instead that 
TwMtA and ISEA populations find a common origin in 
Southeast China [7,14,17,26,27,30,32,33]. Nevertheless, the 
phylogenetic resolution of haplogroup O chromosomes in 
East Asia populations in general and in Taiwanese samples 
in particular, still remains generally rudimentary. Here, 81 
high-definition Y-chromosome SNPs [8] and 17 microsatel- 
lites [32-34] are examined in combination for the first time 
to determine the genetic profiles of extant populations of 
Taiwan, the Philippines, Indonesia, and populations from 
the Indochinese peninsula in Mainland Southeast Asia 
(MSEA, referred herein as Indochina), namely Thailand, 
North Vietnam and the Akha. This investigation was 
undertaken with the aim of evaluating the distribution 
pattern of male-specific diversity in Taiwan, the Philippines 
and Indonesia and to see if this is consistent with a single 
pre-Neolithic expansion scenario from Southeast China to 
all ISEA, or if it better fits the OOT model, or whether 
a more complex history of settlement and dispersals 
throughout ISEA should be envisioned. 

Our results support a northeastward dispersal from 
Southeast Asia (SEA) to Taiwan and Island Southeast Asia 
and corroborate the hypothesis of separate migration routes 
to Indonesia from SEA and Indochina [14,18,25-27] (i.e.. a 
northeastward dispersal from Southeast Asia (SEA) via 
Taiwan to Island Southeast Asia, and a southward dispersal 
via the Indochinese peninsula to ISEA. We extend earlier 
observations by proposing a pincer model that includes 
a multidirectional center of dispersal in SEA, contiguous 
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northward migration routes from Indonesia to the 
Philippines and Taiwan and southward gene flow from 
Taiwan through the Philippines and Indonesia. Finally, 
the genetic structure of Taiwan plain tribe Aborigines 
may introduce new speculation on the time of their first 
settlement in Taiwan. 

Methods 

Samples 

Whole blood or saliva specimens were collected from 
1660 unrelated male individuals from 35 ethnic groups 
in ISEA, Taiwan, the east coast of China and Indochina 
(Vietnam and Thailand) (Table 1 and Figure 1). 

These samples (Figure 1) comprise Austronesian speak- 
ing groups from the Philippines (n = 146) and Indonesia 
(n = 246), and from eleven TwMtA populations (n = 355). 
Among the latter, Atayal, Truku (Taroko) and Saisiyat 
and Thao are all considered as northern Mountain Tribe 
Aborigines (n = 112), Rukai, Paiwan, Puyuma, Amis and 
the offshore Yami are all considered as southern Mountain 
Tribe Aborigines (n = 146), whereas the Thao, Tsou 
and Bunun are referred here as central mountain tribes 
Aborigines. The sampling also includes Han groups from 
Taiwan, namely TwHAN comprising Minnan (n = 60), 
Hakka (n = 34), and a group composed of miscellaneous 
Minnan individuals (n = 258) who were uncertain about 
the origin of at least one of their parents (referred herein 
as MiscHan), as well as a sample of Han from Fujian 
(n = 55), facing Taiwan on the eastern coast of China. 
Finally groups from Indochina, namely Vietnam (Hanoi, 
n = 24), Thailand (Bangkok n = 35 and Akha n = 27) and 
Malaysia (n = 8) were included to represent information 
on the Indochinese peninsula. 

All individuals gave information on their familial birth- 
place, consented to participate in this project, and the 
study and the informed consent protocol was approved by 
the ethics committee of the Mackay Memorial Hospital in 
Taipei (Taiwan). 

DNA was extracted from blood or saliva using QIAmp 
DNA kit (Qiagen inc. USA) and ORAgene DNA self- 
collection kit (DNA Genotek Inc. Canada) respectively, 
with minor modifications to the procedure recommended 
by the manufacturer. 

Comparative data with other populations of south China, 
ISEA and Oceania were taken from the literature and 
comprised Yueh/Daic-speaking populations [27,30,36-38], 
Malayo-Polynesians and Papua New Guineans [39], and 
Han Chinese [7,8,12,29,30,40,41]. These datasets are shown 
in Additional file 1: Table SI. 

Genetic analysis 

Using the PCR-SSP method, a combination of 81 markers, 
the majority of which are slowly evolving SNPs [8], were 
used to genotype in a hierarchical fashion the NRY of 



1660 individuals. Altogether, these 81 markers define 
68 Y-SNP haplogroups: CDEF-M168, C*-M216, C1-M8, 
C2-M38, C3-M217, C4-M247, C5-M356, C6-P55, DE- 
M145, D*-M174, D1-M15, E-M96, F*-M89, G-M201, , H- 
M69, H1-M52, Hla-M82, H2-Apt, I-M170, J-P209, K-M9, 
K1-M147, L-M20, M-P256, NO-M214, N-M231, Nl- 
LLY22g, Nla-M128, Nlc*-Tat, Nlcl-M178, 0-M175, 
01a*-M119, Olal*-P203, Olala-MlOl, Ola2-M50, 
02*-P31, 02a*-PK4, 02al*-M95, 02ala-M88, 02b-SRY465, 
03*M122, 03a*-M324, 03al*-KLl, 03ala-M121, 03alb- 
M164, O3alc*-IMS-JST002611, O3alcl-P103, O3a2*-P201, 
03a2a-M159, 03a2b*-M7, 03a2bl-M113, 03a2c*-P164, 
03a2cl*-M134, 03a2cla-M133, 03a2clal-M162, 03a2clb- 
P101, O3a3-M300, 03a4-M333, PQR-M45, Q-M242, 
R*-M207, R1-M173, Rlal*-SRY10831.2, Rlala-M17, 
Rlb-M343, R2a-M124, S-M230, and T1-M70. Chro- 
mosomes were assigned to haplogroups according to 
the improved methods of Karafet et al. and Yan et al. 
[8,42], and to the classification provided by International 
Society of Genetic Genealogy for the Y Chromosome Con- 
sortium (YCC) [43,44]. In this study, microsatellite 
MSY2.2, usually used to differentiate haplogroup O* from 
OP, was not used on all samples; accordingly the two 
haplogroups were treated conjointly as O'VOl* and are 
referenced as OVOP-M175 (xM119, P31, M122) in the 
analyses. 

PCR was performed in 10 ul reactions containing 0.5 
U AmpliTaq Gold polymerase (Applied Biosystems), 
10 mM Tris-HCl (pH 8.3), 50 mM KC1, 2 mM MgCl 2 , 
0.1 mM each of the four deoxyribonucleotide triphos- 
phates, 0.2 uM each of forward/reverse primers [45] and 
50-100 ng genomic DNA. Thermal cycling conditions 
were 94°C for 10 min, and then 14 denaturing cycles at 
94°C for 20 sec, primer annealing at 63-56°C using 0.5°C 
decrements and extension at 72°C for 1 min, followed by 
20 cycles at 94°C for 20 sec, 56°C for 20 sec, 72°C for 
1 min and a final 5-min extension at 72°C [46]. Prior to 
sequencing, excess dNTP and primers were removed from 
the PCR products by pre-treating with shrimp alkaline 
phosphatase (Sap) and Exonuclease I (Exo I) enzymes 
(USB Product number US 70995 pre-sequencing Kit, 
Pharmacia) following the conditions recommended by the 
manufacturer (37°C for 30 min and 80°C for 15 min re- 
spectively). Sequencing was performed on both strands 
using the DyeDeoxy Terminator Cycle Sequencing Kit 
(Applied Biosystems) according to manufacturer recom- 
mendations. Purification on a G50 sephadex column was 
performed before the final run on an automated DNA se- 
quencer (ABI Model 377). 

All samples were further subjected to genotyping with 
17 microsatellites (DYS19, DYS385I, DYS385II, DYS389I, 
DYS389II, DYSS390, DYS391, DYS392, DYS393, DYS437, 
DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, 
and Y GATA-H4) and analyzed using Y-filer kit (Applied 
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Sample numbers as in Figure 1. 

2 Detailed information and IS0639-3 codes can be searched online version of "Ethnologue" by: Lewis, M. Paul [35]: http://www.ethnologue.com/. 
*Not clearly defined as plain tribes. 
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Figure 1 Geographic map showing the locations of the samples of Fujian, Taiwan, the Philippines and Indonesia genotyped in this 
study. Group numbers identifying sampling locations are described in Table 1, and color codes are : Black squares = Mainland Han, Dark blue 
squares = Taiwan Han, Sky blue circles = Taiwan Plain tribes, Red circles = Taiwan Mountain tribe Aborigines, Orange triangles = Philippines, Green 
stars = Indonesia, Yellow hexagons = Indochina (Indochina). The Wallace's line (black dotted line) separates Eastern ISEA from Western ISEA. Locations 
without a number indicate groups obtained from additional literature (Additional file 1: Table S1), namely: Brown circle = Yueh/Daic/Hainan in south 
China, Grey = Madagascar, White = Solomon island, Pink = Japan and Korea, and Black with red frame = North China. 



Biosystems). In brief, PCR products were mixed with 
GeneScan 500LIZ (Applied Biosystems) as internal size 
standard and analyzed by capillary electrophoresis with 
an ABI Prism 310 genetic analyzer (Applied Biosystems) 
in the mode of standard fragment analysis protocol. 
Genetyper 2.5.2 (Applied Biosystems) was used for allele 
scoring. For all statistical and network analyses, we 



used data from DYS389II by subtracting DYS389I from 
DYS389II [46]. 

Statistical and population genetic analysis 

The Y-chomosome SNP dataset was used to obtain fre- 
quency distributions of haplogroups (clades and sub-clades) 
in the population samples by mere counting, and the 
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unbiased gene diversity index, h, and its standard error 
were calculated using the formulas given by Nei [47]. 

Contour maps of interpolated spatial frequency variations 
of the most relevant clades in East Asia were constructed 
by applying the Kriging algorithm in Surfer 8.0 (Golden 
software). Similarly, the internal diversity of each relevant 
Y-SNP clade was estimated on the basis of its STR variation 
by computing the average variance in repeat size over STR 
loci (the rho statistic), following the method of Zhivotovsky 
et al. [48], and the spatial variation of this statistic was also 
interpolated with Surfer 8.0. 

Thirteen Y-STR loci (DYS19, DYS389 I/II, DYS390, 
DYS391, DYS392, DYS393, DYS385a/b, DYS437, DY438, 
DYS439, DYS635 and Y GATA-H4) were used to estimate 
the age of the variation within each SNP haplogroup fol- 
lowing the modified coalescence method of Zhivotovsky 
et al. [48-50], assuming a generation time of 25 years and 
a single mutation rate of 0.00069 per locus per generation. 
However, the results were very similar to those obtained 
only with the seven most commonly used STRs in the 
data retrieved from the literature. Accordingly, the same 
seven Y-STR loci, or only five of them in the worst cases 
(noted in italic here after), were used to accommodate the 
literature data (DYS19, DYS389 /, DYS389 II, DYS390, 
DYS391, DYS392, and DYS393). Age estimates of STR 
variation of haplogroups comprised less than 10 individ- 
uals were also calculated but results are to be considered 
with caution. 

Gene contribution estimates between populations were 
inferred by two methods; firstly the coalescent approach 
of Admix version 2.0, which infers contributions from 
parental populations according to STR haplotype frequen- 
cies considering each haplotype as an allele of the same 
locus [51,52], and secondly using the analysis of shared 
STR lineages (LS) between populations [53] to infer those 
contributions as well as to determine the unshared gene 
portion in the hybrid population. The Fujian and Taiwan 
Han samples were pooled so as to constitute a putative 
Han parental contributor and a pool of most samples of 
Austronesian speaking groups, namely TwMtA, Filipinos 
and Western Indonesians (i.e. all samples from Borneo, 
Sumatra and Java) as the other putative parental con- 
tributor. Seven STRs were used in these analyses (i.e. 
DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 
and DYS393. 

Both the SNP and STR datasets were used to perform 
AMOVA analyses of population structure using the 
Arlequin 3.5.1.2 software [54]. Multidimensional scaling 
(MDS) analysis was performed to represent patterns of 
genetic relationships between all groups in our data set. A 
Reynolds distance matrix was obtained from frequency 
distributions of SNP haplogroups with Arlequin, and used 
as input for MDS analysis. MDS plots were constructed 
using XLSTAT software version 7.5.2 [55]. 



Median joining (MJ) networks of Y-STR haplotypes 
(defined by DYS19, DYS389I, DYS389II, DYS390, DYS391, 
DYS392 and DYS393) for relevant SNP haplogroups and 
sub-haplogroups of the O clade were constructed from re- 
duced median joining networks using the NETWORK 
4.1.0.6 software [56]. We used a reduction of one and locus 
specific weights based on the relative mutation rates of the 
following Y-STR loci: DYS19 (weight of 5), DYS389I (4), 
DYS389II (3), DYS390 (2), DYS391 (2), DYS392 (20), and 
DYS393 (20) [57-60]. 

Results 

Y chromosome diversity and its geographic distribution 

The frequency distribution of Y-chromosome SNP hap- 
logroups detected in Taiwan, ISEA and Indochina is 
shown in Figure 2 and reported in detail in Additional 
file 1: Table S2, and are summarized is Additional file 1: 
Figures SI and S2. Additional file 1: Figure S3 displays 
the variation of diversity measures according to latitude 
among TwMtA. The interpolation contour maps resulting 
from applying the Kriging method both to the frequency 
distributions of Y-SNP clades and their internal STR diver- 
sity are shown side-by-side in Figure 3. Forty seven out of 
the 81 genotyped Y-SNPs were observed in the derived 
state [61], thus defining 47 haplogroups observed in our 
samples, that belong to major clades C, D, F*, G, H, J, K*, 
L, N, O, P* Q and R. MJ networks of major Y-SNP clades 
in SEA are shown in Figure 4. Age estimates of the diver- 
sity of these clades, based on the assumption that muta- 
tions in haplotypes accumulate in situ (i.e. no gene flow), 
are reported in Table 2 and Additional file 1: Table S3. 

The most prominent clade in Taiwan, 0-M175 as a 
whole, represents almost 90% of Y chromosomes among 
the TwHan, about 95% among the TwPlt and more than 
99% among the TwMtA (Additional file 1: Figures SI 
and S2). Only one representative of the basal 0*/01*- 
M175 (xM119, P31, M122) was seen in each of the Luzon 
(Philippines), Fujian and TwPlt samples (Figure 2 and 
Additional file 1: Table S2). All other haplogroups of the 
O clade were observed at the derived state for the Ml 19, 
P31 and M122 markers (Figure 2, Additional file 1: Table 
S2, and Additional file 1: Figures SI and S2). 

Haplogroup 01a*-M119 (Figure 2 and Additional file 1: 
Table S2) is seen throughout Batan (42%), the Philippines 
(4% to 33%) and Indonesia (4% to 18%). It has a patchy 
distribution among TwMtA (3-33%) and only some 
southern TwMtA show frequencies greater than 10% 
(i.e. Puyuma, Paiwan and Yami). 01a*-M119 was not 
observed in our Amis sample, neither in the Bunun, 
Saisiyat and Thao and has a low frequency among 
TwHan (1.4% to 2%). 01a*-M119 was neither observed 
outside Taiwan, among the Kalimantan in Borneo nor 
in Sumatra. However, the contour map interpolation 
(which includes published data, see Additional file 1: 
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Figure 2 Phylogenetic tree of 47 Y-chromosome haplogroups seen in this study (shown in boldface) and hierarchically defined using 
81 slowly evolving binary markers (68 in the Figure). The marker names are shown along the branches, and haplogroup names are shown 
on the right side according to ISOGG Y-DNA Haplogroup Tree 201 1. Potentially paraphyletic undefined subgroups are distinguished from recognized 
haplogroups by the asterisk symbol. Haplogroups tested for but not seen in this study are shown in (italic). See Additional file 1 : Table S2 for a more 
detailed frequency table. 
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Figure 3 Spatial distributions of the 01 , 02 and 03 clades using haplogroup frequency and associated STR diversity (rho statistic). 

Maps are based on data from Additional file 1: Tables SI and S2 and from literature data [12,27,30,36,39,40,62]. Panels are labeled according to 
ISOGG2011 [8,42,43]. Arrows symbolize dispersals and gene flow, and stages (B, C and D) are according to Karafet et al. [26] (see Discussion 
section). The age at the beginning of arrows represents the likely time of origin of the haplogroup as estimated from its STR diversity (Table 2 
and Additional file 1: Table S3), and arrow colors represent time of dispersal (red for Paleolithic, blue for Neolithic/Austronesian expansion and 
yellow for contemporary historical times). 



Table SI) indicates its presence in Western Sumatra, 
towards the Indian Ocean (Figure 3). The internal STR 
diversity of 01a*-M119 decreases gradually from mainland 
China towards the south, although a second, somewhat 
lower peak of diversity is observed around the Moluccas. 



The MJ network of 01a*-M119 (Figure 4) clearly differ- 
entiates TwMtA and TwPlt from Indonesia (Indonesian 
STR haplo types are all included in the lower left reticula- 
tion of the 01a*-M119 network), whereas most Filipinos 
01a*-M119 haplotypes are shared or very similar to those 
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(See figure on previous page.) 

Figure 4 Median joining Networks for the whole dataset and published East Asian Y haplogroup lineages based on seven Y-STR loci 
[7,8,12,29,30,41]. Sizes of circles are proportional to Y-STR haplotype frequency; lines between circles (links) represent mutation differences. 
Green links in 01 a2 and 02a1a represent ancestry sharing between Solomon and Madagascar (note that only TwMtA 01 a2 is included in this 
pathway. Colors within circles indicate populations. 



found among TwMtA and TwPlt Age estimates based on 
the amount of molecular variation for 01a*-M119 
were higher among TwMtA (19.96 ± 5.93 Kya) and the 
Philippines (20.36 ±5.65 Kya) than in Indonesia (4.14 ± 
2.67 Kya) (Table 2 and Additional file 1: Table S3). 

Olal*-P203, derived from 01a*-M119 (Figure 2 and 
Additional file 1: Table S2), is the most common hap- 
logroup among the northern TwMtA and the Tsou 
(90%), while among southern TwMtA it represents about 
half of the Y chromosomes observed. It is also quite com- 
mon in several TwPlt (e.g. Pazeh, Ketagalan and Siraya in 
various locations), but less frequent in the Philippines 
(13.7%) and Indonesia (16.3%), although it is observed in 
36% of the Kalimantan in Borneo. It is also commonly 
seen in Han (22% and 14.2% in Fujian and TwHan, re- 
spectively), but uncommon in Thailand (2.7%), and was 
not observed in the Vietnamese (Hanoi) and the Akha. 
Accordingly, the contour map of the frequency variation 
of Olal*-P203 shows two main locations of high 
frequency, in Taiwan and southwestern Borneo, and 
decreasing frequencies from these locations towards the 
Philippines, whereas the contour map of its internal 



diversity is more complex (Figure 3). The Olal*-P203 
MJ network has a marked star-like shape, with a central 
frequent haplotype detected throughout mainland and 
insular SEA (Figure 4). Interestingly, haplotypes observed 
in the Atayal branch off from this central node to form a 
distinct network, suggesting a founding event and a period 
of isolation in this population. The intriguing observation 
of Tsou Olal*-P203 haplotypes deriving from those of 
Atayal evidenced by the seven STRs MJ network is no 
longer sustained when using more than 13 Y-STRs, as 
the two groups then split at an earlier founding level 
(13 Y-STR MJ network, data not shown). Age estimates 
of Olal*-P203 in TwMtA (16.3 ± 5.9 Kya) and the 
Philippines (16.05 ±3.6) were higher than estimates ob- 
tained for TwHan, Fujian or Western Indonesia (8.59 ± 3.3, 
11.6 ±6.1 and 7.28 ± 4.0, respectively) (Table 2 and 
Additional file 1: Table S3). Haplogroup Olala-MlOl 
[46] is a para-group of Olal*-P203 (Figure 2). It was 
tested on all Olal*-P203 in our dataset but was not seen. 

Haplogroup Ola2-M50 (or Ola2-M110, Figure 2 and 
Additional file 1: Table S2), a sister haplogroup of 
Olal*-P203, is frequently observed among southern 



Table 2 Age estimates of the diversity accumulated in Y-haplogroups using a 7 Y-STRs dataset 

Age in Kya (n) 2 
±SE 



Haplogroup 1 


Taiwan 


Plain 


Taiwan 


Philippines 


Western 


All 




Han 3 


tribes 


mountain tribes 


Batan only 


Philippines only 


Indonesia 


populations 


01a*-M119 


16.56 (5) 


13.38 (7) 


19.96 (28) 


10.82 (10) 


20.36 (15) 


4.14 (5) 


24.67 (80) 




±4.39 


± 2.575 


± 5.53 


± 2.85 


± 5.65 


± 2.67 


± 5.31 


O1a1*-P203 


8.59 (50) 


10.74 (93) 


16.29 (210) 


9.83 (10) 


16.05 (15) 


7.28 (32) 


13.45 (437) 




± 3.27 


±6.6 


± 5.88 


± 3.51 


± 3.57 


± 4.00 


±4.90 


O1a2-M50 


na 4 


8.63 (30) 


1 7.54 (67) 


na 


12.48 (16) 


7.06 (11) 


14.23 (128) 






± 3.18 


± 3.56 




± 5.95 


± 2.63 


± 3.73 


02a1*-M95 


10.35 (15) 


8.63 (30) 


8.05 (9) 


na 


14.73 (13) 


14.74 (59) 


14.23 (128) 




± 2.28 


± 3.18 


± 2.32 




± 7.64 


± 2.86 


± 3.73 


O3a1c*-IMS-JST002611 


19,41 (48) 


16,51 (42) 


na 


0.86 (6) 


na 


28.47 (4) 


14.23 (128) 




±6.62 


± 4.74 




± 0.86 




± 13.76 


± 3.73 


03a2c* Ml 64 


19.55 (9) 


10.35 (13) 


13.62 (19) 


na 


13.37 (24) 


6.90 (14) 


13.56 (65) 




± 7.73 


± 3.48 


± 5.12 




± 2.82 


± 2.10 


± 3.59 


03a2c1a-M133 


12.40 (38) 


8.70 (44) 


8.63 (6) 


na 


na 


na 


1 1 .40 (94) 




± 3.18 


± 3.19 


± 4.13 








± 5.11 



^he seven Y-STRs used are: DYS19, DYS389 I, DYS389 II DYS390, DYS391, DYS392, and DYS393. 

2 Kya: thousands of years; ( n): sample size; SE : standard error. 

3 Taiwan Han include Minnan, Hakka and MiscHan. 

4 na: age was not estimated if less than 5 individuals bear the haplogroup. 



Trejaut et al. BMC Genetics 2014, 15:77 
http://www.biomedcentral.com/1471-21 56/1 5/77 



Page 12 of 23 



TwMtA (from 18% to 28%) and in the Bunun (61%), 
variable in the Philippines (from 3% to 16.7%), but rather 
rare in Western Indonesia (Java and Sumatra, -5.3%). 
Note however that a high frequency of this haplotype has 
been reported in South Nias (-80%), an island of the 
Indian Ocean in front of Sumatra [26,63,64]. Ola2-M50 
was not seen in our dataset in the Yami and was rare in 
TwHan, Fujian, Thailand, Vietnam and Malaysia. How- 
ever, it has been reported as prevalent among a few 
Daic-speaking groups from southern China and Hainan 
(3% to 25%) [26,65], thus explaining the decrease in 
frequency and molecular variation of Ola2-M50 from 
Taiwan to the Philippines and Indonesia (Figure 3). Age 
estimates of diversity of this haplogroup decrease from 
Taiwan (17.54 ±3.56 Kya) to the Philippines (12.5 ± 5.9 
Kya) and Western Indonesia (7.06 ± 2.63 Kya) (Table 2 
and Additional file 1: Table S3). The star-like shape of 
the Ola2-M50 MJ network places in the central nodes 
the STR haplotypes observed among TwMtA, TwPlt, 
Filipinos and Indonesians (Figure 4). We also notice 
that the haplotype continuity observed along the 
Taiwan-Philippines-Indonesian pathway can be traced 
further away toward Madagascar and the Solomon 
Islands [7,66,67]. 

Haplogroup 02 is comprised of two derived branches 
(Figure 2). The first branch includes subtypes 02*-P31 
observed in Han (4%), TwPlt (5.7%), the Philippines 
(Luzon, 4.7%) and Indonesia (3.25%), and 02a*-PK4, 
02al*-M95 and 02ala-M88 mostly seen among groups 
speaking Daic languages in South China and Indochina, 
and in Indonesia (Additional file 1: Table S2). The second 
02 branch includes subtypes of 02b-SRY 465 that are only 
found among Korean, Japanese and northern Chinese 
[11,68]. The distribution of the recently defined 02a*-PK4 
[42] was reanalyzed by two separate laboratories (P. 
Underhill, Stanford University School of Medicine, CA, 
USA, personal communication and this study) using a 
total of 105 individuals (data not shown). The A to T 
transversion of PK4 [8,42] was confirmed to be ancestral 
to the M95 SNP (Figure 2). In our dataset, haplogroups 
02al-M95 and 02ala-M88 occurred principally in 
Indochina (-20% and 20% for 02al-M95 and 02ala-M88, 
respectively, in Thailand, and 8% and 25% in Vietnam) and 
Indonesia (29% and 3%), and had a scattered distribution 
between Fujian, TwHan and TwPlt, with rather low fre- 
quencies. Among the TwMtA, the Yami (10% and 3.33% 
for 02al-M95 and 02ala-M88, respectively) and Bunun 
(0% and 37.5%) were the only Mountain tribes bearing 
haplogroup 02. While a study reported the presence 
of 02al-M95 and 02ala-M88 in Mindanao [7], only 
02ala-M88 (3%) was seen in our Philippines dataset 
(6.45% in Visayas, 3.42% on average for all the Philippines). 
We notice that the most frequent 02ala-M88 Y-STR 
haplotype in the Bunun is also observed in China, among 



Daic-speaking populations, in Indochina and in Indonesia, 
but not among Solomon islanders, TwPlt or Han (Figure 4). 
In fact, Bunun 02ala-M88 lineages do not belong to 
the MJ branch observed in the Solomon Islands or 
Madagascar. 

Eleven 03 lineages, out of the 19 described by Karafet 
et al. [8,26,39] were observed in this study (Figure 2 and 
Additional file 1: Table S2). The most widely distributed 
sub-haplogroups of 03 in our dataset are 03alc*-IMS- 
JST002611, O3a2*-P201, 03a2b*-M7 and 03a2c*-P164. 

A rather high prevalence of haplogroup 03alc*-IMS- 
JST002611 is observed in Fujian Han (-25%) and in the 
Taiwanese Minnan and Hakka (13% and 21%), as well as 
among the TwPlt (12%), whereas it occurs at an extremely 
low frequency in TwMtA (it was only observed in the 
Yami, at 3%), in the Philippines (0.8%) and in Indonesia 
(2%) (Figure 2 and Additional file 1: Table S2). Its high 
frequency among the Ivatan (25%) of the Batan archipel- 
ago north of the Philippines is however associated with a 
low age estimate (0.9 Kya) (Table 2 and Additional file 1: 
Table S3). The MJ network of this haplogroup is charac- 
terized by the presence of Han Y-chromosomes at most 
founding nodes (Figure 4). Accordingly, a location of high 
frequency of haplogroup O3alc*-IMS-JST002611 emerges 
from the contour map, centered in the southeastern 
Chinese coast, with no clear clinal pattern of frequency 
variation across SEA (Figure 3). The contour map of 
O3alc*-IMS-JST002611 diversity displays a high peak in 
Indonesia, but this is due to a few (only 4), molecularly 
distant STR-haplotypes which translates to an older but 
less precise age estimate (28.47 ± 13.76 Kya, Table 2 and 
Additional file 1: Table S3). 

Other commonly observed haplogroups in the 03 series 
are the paragroups of the O3a2*-P201 branch (Figure 2 
and Additional file 1: Table S2). Note that mutation P201 
has not been tested in the Daic-speaking groups of China 
and Hainan [27], in which the resolution of 03* included 
03*, 03a*, 03al*, 03alb, 03alc, 03a2*, 03a2a, 03a2c*, 
03a3 and 03a4 [42]. Here O3a2*-P201 did not exceed 2% 
in any pooled population group of our data set, and 
reached 4.35% in the Puyuma. These low frequencies 
and the high age estimates seen in Taiwan (i.e. Puyuma) 
(15 ± 3 Kya) and Western Indonesia (17 Kya ± 2 Kya; 
Additional file 1: Table S3) hint late to recent gene flow. 
The Kriging contour maps suggest two paths linking the 
patterns of diversity of O3a2*-P201, namely from main- 
land SEA towards Taiwan and from mainland SEA to 
Indonesia through the Indochinese peninsula (Figure 3). 
However, these latter results must be viewed with caution 
given the general low frequency of O3a2*-P201 over the 
whole region. 

Haplogroup 03a2b*-M7 is uncommon in the Fujian Han 
(less than 2%), in Taiwan (only observed at low frequencies 
in TwHan and TwPlt, not observed among TwMtA) and 
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the Philippines, but was more frequent in Indonesia 
(8.54%) (Figure 2 and Additional file 1: Table S2). In this 
study we observe it also in Indochina (12.5% and 6.7% 
in Vietnam and Thailand, respectively), as well as in 
Malaysia (although this latter region is represented by a 
sample of only 8 Y chromosomes). Age estimates of 
03a2b*-M7 diversity for Indonesia (22.18 ± 6.27 Kya) 
and TwPlt (16.56 ± 4.67) are somewhat older than that 
for Thailand (12.42 ± 3.45) (Additional file 1: Table S3). 
However, as already noted, these estimates are based on 
the assumption that no diversity is introduced in a clade 
by gene flow between populations. However, the scattered 
and generally derived location of Indonesian, TwPlt and 
Filipino haplo types in the 03a2b*-M7 MJ network is con- 
sistent with repeated introductions of these haplotypes by 
gene flow from the mainland into the islands (Figure 4). 

Haplogroup 03a2c*-P164 is frequently observed among 
TwMtA on the east coast of Taiwan (Amis 35.9%, Puyuma 
13%, Taroko 5%), in the Philippines (from 8% to 50%) 
and in western Indonesia (17.8%, Kalimantan 20%, and 
Sulawesi 11.8%), while it is rather uncommon in Indochina 
and among the Han (Figures 2 and 3, and Additional file 1: 
Table S2). Interestingly, the star-like MJ network of 
03a2c*-P164 shows core nodes mostly comprised of Han, 
Korean/Japanese and Indochinese groups (Figure 3) and 
radiating sectors mainly comprised of either Korean/ 
Japanese and Tibet, or Indochinese, or delineating TwMtA. 
This structure supports the dispersal paths put forward in 
the contour map (Figure 4). 

In turn, derived haplogroups 03a2cl*-M134 and 
03a2cla-M133 are rarely seen among TwMtA. They 
often occur together and are more frequent among 
Han and TwPlt as well as in Sumatra, Borneo and the 
Visayas in the Philippines. Age estimates of the diversity 
of these two haplogroups all point to the early Neolithic 
period (i.e. < 10,000 Kya), with the notable exception of 
estimates for the Han groups (Table 2 and Additional 
file 1: Table S3). Accordingly, the contour maps of 
03a2cl*-M134 display a general location of higher fre- 
quency and diversity related to Han populations, in 
mainland SEA (Figure 3). 

Finally, haplogroups C, D, F*, H, K*, N, P*, Q and R 
haplogroups were observed at low frequency in the re- 
gion, with patchy distributions (Figure 2 and Additional 
file 1: Table S2). 

Patterns of differentiation of SEA populations 

The plot of the multidimensional scaling (MDS) analysis 
of pairwise Reynolds genetic distances obtained from 
high definition Y-SNP haplogroup frequencies in our 
dataset is shown in Figure 5. Most population samples 
are located in the center of the plot, with Austronesian 
speaking groups from Indonesia and the Philippines 
surrounded by populations from Vietnam, Thailand and 



Malaysia (differentiating towards the upper-left part of the 
plot), most TwMtA differentiating towards the bottom- 
right, and the Han, both from the mainland and from 
Taiwan, differentiating towards the upper-left along the x 
axis. The heavily sinisized TwPlt surround Han popula- 
tions with Yulin and Papora on the left part of the Han, 
and the Pazeh and Siraya (from Tainan, Pingtung, and 
Hwalian) getting close to southern TwMtA and most 
likely representing less sinisized groups than other TwPlt. 
A relationship with the frequencies of O clades in these 
populations (Additional file 1: Figures SI and S2) is 
evidenced by the fact that 02 haplogroups prevail in 
Indochinese populations located in the upper-left quad- 
rant of the MDS plot, whereas Ol haplogroups become 
more and more frequent in Indonesian, Filipino, south 
TwMtA and north TwMtA populations, which differenti- 
ate towards the bottom of the plot. Bunun, Akha, Atayal 
and Tsou are clearly four distinct outliers in the plot, 
due to their particularly low diversity. Indeed, only 3 
haplogroups were observed among 56 Bunun chromo- 
somes, of which Ola2-M50 at 61% and 02ala-M88 at 
37.5%, and this translates in a gene diversity level of 
only 0.49 (Additional file 1: Table S2). Also, only 3 hap- 
logroups were observed among 41 Tsou and 52 Atayal 
chromosomes, of which Olal*-P203 at 90% (amongst the 
highest frequencies observed in ISEA for this haplogroup), 
with a gene diversity of 0.18 for the two groups. As for the 
Akha population, its differentiation from all other SEA 
populations is driven by the unique presence of hap- 
logroup Q-M242 (56%), which was mainly unobserved in 
our dataset, except for a very low frequency (<1%) in 
TwHan. The axis of differentiation constituted by TwMtA 
in the lower-right part of the MDS plot is mostly driven 
by the northern tribal groups (Taroko, Thao, Saisiyat and 
Atayal) and underlines their low gene diversity (h is 0.10, 
0.23, 0.23 and 0.18, respectively) due to the high frequency 
of haplogroup Olal*-P203 (Additional file 1: Figure SI 
and Additional file 1: Table S2). 

In a second MDS analysis, we included earlier published 
data (Additional file 1: Table SI) to investigate the genetic 
relationships of populations represented in our dataset to 
other East Asian groups. The resulting 2-dimensional plot 
is provided in Additional file 1: Figure S4. The range and 
variation of Y chromosome diversity present in Taiwan is 
well exposed in this second MDS, which echoes the first 
one reported in Figure 5. Taiwanese tribes, starting first 
with the Amis, then the southern TwMtA tribes and fi- 
nally the northern TwMtA tribes spread away towards the 
lower-right part of the plot, from a lower central cluster 
principally comprising TwPlt, the Philippines, Western 
Indonesia, some Han from southern China and a few 
Daic-speaking groups. The latter also form a second axis 
of differentiation including mainly Daic-speaking groups 
from China, as well as Indochinese populations that 
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spreads towards the lower right-part of the plot. With re- 
spect to Figure 5, the outlier location of the Bunun is again 
indicative of a high level of genetic drift in this population, 
consistent with its low level of gene diversity. Interestingly 
enough, however, the position of the Bunun in the MDS, at 
midway between the axis of differentiation of TwMtA and 
that of the Daic-speaking groups from southern China, is 
explained by the high frequencies of haplogroups 01a2- 
M50 (67%, the highest frequency seen in Taiwan) and 
02ala-M88 (37%), both of which are commonly found 
among Daic populations in south China [27]. 

Admixture 

Contribution from two putative parental groups, namely 
"ancestral Han" and "ancestral Austronesians", to a hybrid 



population, respectively, TwMtA, TwPlt, Filipinos and In- 
donesians, estimated by using Admix 2.0 and through the 
STR lineage sharing method (LS) are reported in Table 3. 
Frequencies of Y-STR lineages for Admix and LS were 
obtained in the background of their respective Y-SNP 
haplogroup. 

Results between Admix and LS (Table 3) correlate well 
and suggest that all putatively hybrid groups but the TwPlt 
received an important contribution from "ancestral 
Austronesian" speaking populations. In turn, TwPlt Y-SNP 
haplogroups are predominantly shared with Han popu- 
lations (Figure 2 and Additional file 1: Table S2) and 
expectedly the "ancestral Han" contribution estimated 
with Admix and LS is high (62% and 79% respectively). 
However, the effective relative frequency of parental Y-STR 
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Table 3 Gene contribution estimates between populations using variation at 7 Y-chromosome STR loci 

Putative parental groups Admix 2.0 Lineage sharing (LS) 

Hybrid populations Hybrid populations 

Philippines TwMtA Indonesia TwPIt Philippines TwMtA Indonesia TwPIt 
(n = 144, (n = 350, (n = 222, (n = 347, (n = 144, (n = 350, (n = 222, (n = 347, 
hp =127) hp =156) hp = 194) hp = 206) hp = 127) hp = 156) hp = 194) hp = 206) 

Han All (n = 299, hp = 233) 0.38 ± 0.08 0.35 ± 0.03 0.30 ± 0.08 0.62 ± 0.08 0.1 3 (0.04) 0.08 (0.03) 0.1 4 (0.06) 0.79 (0.1 2) 

Austronesian pool (n = 71 6, hp = 458) 0.62 ± 0.08 1 .35 ± 0.07 0.70 ± 0.085 0.38 ± 0.08 0.87 (0.26) 0.92 (0.33) 0.86 (0.39) 0.21 (0.03) 

Using Admix 2.0 [52,53], the time in years to the admixture event and the mutation rate per year were set to zero. The contribution from the putative parental 
population is expressed by the estimator of admixture my (±standard error). Note that the sum of parental contribution sums to one. The greatest estimate of my 
determines the greater contributor [69]. For lineage sharing [53] the contribution also sums to one. ( ) The observed relative frequency of the contribution to the 
hybrid population is indicated in brackets, hp = number of 7xSTR haplotypes. 



haplotypes contributed by each parental group, as esti- 
mated through the LS method (shown in brackets in 
Table 3), reached a lower level than anticipated, with 
only 12% of TwPIt Y-STR variation attributed to the 
"ancestral Han" gene pool and only 3% to the "ancestral 
Austronesian". This indicates that 79% (i.e. 44/ 209) of 
the remaining Y-STR variation is unique to the TwPIt. 
Such a large amount of unshared variation could only 
have been acquired after a long period of settlement in 
isolation from other groups, much longer than 400 years, 
date at which Han Chinese (Minnan and Hakka) migrated 
to Taiwan from southeast China [1]. Actually, on the basis 
of MJ networks constructed using only the unshared 
haplotypes showing continuity in the background of 
their respective haplogroups we tentatively estimated 
this period to ~ 3 to 8 Kya. 

Analysis of molecular variance 

Analysis of molecular variance (AMOVA) was performed 
first using Y-SNP haplogroups and then Y-STRs haplo- 
types (Table 4). The TwMtA group shows the lowest SNP 
variance within populations (70.3%), consistent with the 
low gene diversity observed in this group of populations, 
ranging from 0.10 to 0.7, or averaging to 0.60 ± 0.02 over- 
all (Additional file 1: Table S2). Accordingly, the TwMtA 
also display the highest SNP variance due to differenti- 
ation between tribes (29.7%) thus explaining the scattered 
location of the Taiwan tribes observed in the MDS plot 
(Figure 5 and Additional file 1: Figure S4). 

In contrast to TwMtA, high SNP variation between 
individuals within populations is found for Western 
Indonesia (94.30%), the Philippines (98.10%) and TwPIt 
(96.24%), and the levels of differentiation among popula- 
tions within groups are correspondingly low, ranging from 
non-significant for Filipino populations to less that 6% 
among western Indonesians, even though populations 
in Western Indonesia and the Philippines are broadly 
dispersed over many isolated regions of ISEA. On another 
other hand, the high variance seen within TwPIt popula- 
tions (96.24%) was expected as they are heavily admixed 
groups. 



When testing genetic differentiation among four sep- 
arate geographical groups, namely TwMtA, Philippines, 
Western Indonesia and Han (representing mainland 
China), or three distinct language family assortments 
(Formosan, Malayo-Polynesian and Sino-Tibetan), we 
observe that the variation between groups is large (18% 
and 18.55% respectively, P < 0.001) and did not show much 
difference between geographic or linguistic groupings. This 
pattern remains the same when including Indochina as a 
fifth geographic region or as a fourth linguistic group. 

Very little changes from the Y-SNP results were observed 
when using Y-STRs to perform AMOVA computations on 
the individual groups of populations, with the variance 
due to differences between individuals within populations 
being extremely high for TwPIt, Philippines and Western 
Indonesia (above 96%) and much lower for TwMtA (68%). 
However, with Y-STRs we observe that the component of 
the variance due to differences between groups of popula- 
tions, although significant, is always lower than that due 
to differences among populations within these groups, 
thus indicating that, contrarily to Y-SNPs, Y-STRs fail to 
detect a population structure associated with geography 
or linguistics. 

Discussion 

Our data, obtained through the genotyping of 81 high 
Y-SNP definition markers to determine the fine distribu- 
tion of Y-chromosome haplogroups O in ISEA, revealed a 
high level of population structure in the region including 
Taiwan, the Philippines and Indonesia, mainly defined by 
variable distributions of haplogroups belonging to the O 
clade [9,27,70]. We also genotyped 17 Y-specific STR 
markers, in order to gain insights into the distribution 
of the Y-chromosome variation in this region of the 
world. Since only one population from the mainland east 
coast of China (Fujian Han) was analyzed, data from other 
populations of SEA were obtained from the literature 
(Additional file 1: Table SI). 

The O clade, to which belong 95% of Y-chromosomes 
in Taiwan, reflects an ancestral relationship to the early 
modern human settlement in East Asia [28-30] with 
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Table 4 Analysis of Molecular variance (AMOVA) using Y-SNPs haplogroups and 17 Y-STRs for various groupings of 
populations 

Grouping Variation seen 



Among Within 
populations populations 



Y-STR 





Variance (%) 


Fst 


Variance (%) 








Han (East China) 


1.17 


0.01 


98.83 








TwMtA (9 tribes) 


29.70* 


0.30 


70.30 








TwPIt (Siraya, Pazeh) 


3.76* 


0.04 


96.24 








Ph (Luzon, Mindanao, Visayas) 


1.90 


0.02 


98.10 








Western IN (Borneo, Sumatra, Java) 


5.70* 


0.06 


94.30 








Indochina (Thai, Akha, Vietnam) 


10.41* 


0.10 


89.59 














Variation seen 








Between groups 


Among populations 
within groups 


Within populations 


Among all populations 
considering no grouping 




Variance (%) 


Fct 


Variance (%) 


Fsc 


Variance (%) 


Fst 


Geographic grouping 














TwMtA/Ph/Western IN/Han 


18.09* 


0.05 


6.51* 


0.14 


75.40 


0.18* 


TwMtA/Ph/Western IN/Han/lndochina 


16.83* 


0.17 


7.41* 


0.09 


75.76 


0.24* 


Linguistic grouping 














TwMtA/Ph + Western IN/Han 


18.55* 


0.19 


6.97* 


0.086 


74.48 


0.26* 


TwMtA/Ph + Western IN/Han/lndochina 


17.19* 


0.17 


7.21* 


0.09 


75.59 


0.24* 




Variation seen 










Among 
populations 


Within 
populations 










Variance (%) 


Fst 


Variance (%) 








Han (East China) 


0.18 


<0.01 


99.82 








TwMtA (9 tribes) 


31.78* 


0.32 


68.22 








TwPIt (Siraya, Pazeh) 


3.99* 


0.04 


96.01 








Ph (Luzon, Mindanao, Visayas) 


-0.84 


-0.01 


101.00 








Western IN (Borneo, Sumatra, Java) 


2.34* 


0.02 


97.66 








Indochina (Thai, Akha, Vietnam) 


11.26* 


0.11 


88.73 














Variation seen 








Between groups 


Among populations 
within groups 


Within 
populations 


Among all populations 
considering no grouping 




Variance (%) 


Fct 


Variance (%) 


Fsc 


Variance (%) 


Fst 


Geographic grouping 














TwMtA/Ph/Western IN/Han 


2.74* 


0.03 


12.99* 


0.13 


84.25 


0.16* 


TwMtA/Ph/Western IN/Han/lndochina 


2.98* 


0.03 


12.74* 


0.13 


84.28 


0.15* 


Linguistic grouping 














TwMtA/Ph + Western IN/Han 


2.53* 


0.03 


13.29* 


0.14 


84.16 


0.16* 


TwMtA/Ph + Western IN/Han/lndochina 


2.79* 


0.03 


13.01* 


0.13 


84.20 


0.16* 



*P < 0.05 AMOVA test for Y-STR was based on 17 STR loci TwMtA: Taiwan mountain tribe Aborigines; TwPIt: Taiwan plain tribes; Ph: Philippines (Luzon, Mindanao, 
Visayas); Western IN: Western Indonesia (Borneo, Sumatra Java, Sulaweisi). 



haplogroup, Ol being mostly seen in Southeast China, 
Taiwan and ISEA, haplogroup 02a predominantly confined 
to southeast China and Indochina, and haplogroup 03 
broadly present in mainland China. 



Haplogroup diversity in Taiwan and its relation with the 
Philippines and Indonesia 

The Y-SNP gene diversity among TwMtA was found 
generally low (from 0.1 to 0.7, Additional file 1: Table S2). 
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Except for the presence of haplogroups 02ala-M88 
predominantly seen in Bunun (37.5%) and 03a2c-P164 
in Puyuma and Amis, the low diversity found in TwMtA 
is principally associated with the molecular variation of 
haplogroups within Ol (Figure 2 and Additional file 1: 
Table S2 and Additional file 1: Figure SI). In sharp con- 
trast with the TwMtA, non-aboriginal groups in Taiwan, 
namely Minnan, Hakka and the general mixed Han 
Taiwanese (MiscHan), as well as the TwPlt groups all 
present high levels of gene diversity, and a total of 26 
distinct branches of the O clade (including the Ol, 02 
and 03 clades) were observed in these groups at variable 
frequencies. These results suggest substantial admixture 
among plain tribe groups in Taiwan, not among mountain 
tribes. Fast genetic drift in TwMtA, due to small popula- 
tion size and isolation is a likely explanation for these ob- 
servations, and a founder event linked to the initial 
settlement of the island is also plausible. On another hand, 
larger population sizes in TwPlt, possibly related to gene 
flow and admixture events, would have resulted in the 
higher levels of diversity in TwPlt observed nowadays. Al- 
ternatively, the possibility exists that our village-focused 
sampling method for TwMtA might have biased our 
results towards underestimation of the actual diversity 
present in these groups. Similarly, we cannot exclude that 
very recent migration to main urban centers from where 
most of our sampling comes from must have contributed 
toward the greater diversity observed in the Philippines 
and western Indonesia. However, to our credit, the 
scattered location of TwMtA groups observed in the MDS 
plot (Figure 5 and Additional file 1: Figure 3), which 
strongly suggests fast genetic drift, matches similar 
patterns reported elsewhere for Taiwan and also for 
Melanesia [67,71,72]. Furthermore, a similar scattered 
pattern was also observed with polymorphisms in HLA 
loci [73], see Figure 3 in it), as well as with a classical 
marker [74], thus further supporting the idea that the 
genetic history of TwMtA was characterized by drift 
occurring in small isolated groups. 

In a recent study of the diversity of Taiwan mtDNA 
complete genomes, Ko et al. [75] observed a decreasing 
pattern of diversity in TwMtA populations from north 
to south, although the significance of the decrease was not 
reported. A similar tendency for diversity to decrease 
towards southern Taiwan was also found for the classical 
GM genetic polymorphism but did not reach statistical 
significance [74]. Interestingly, for the non-recombining 
Y chromosome, we find here an opposite pattern in that 
gene diversity increases from north to south in a significant 
fashion (Additional file 1: Figure S3, first plot, P = 0.0142). 
This increase is concomitant with a tendency for decreasing 
frequency of haplogroup Olal* - P203 and increasing 
Y-STR diversity in this haplogroup, but both these pat- 
terns are statistically not significant (Additional file 1: 



Figure S3). Although the contrasting results between 
mtDNA and autosomal markers, on one hand, and the Y 
chromosome on the other are for the time being inconclu- 
sive, they nevertheless could hint to a differentiated demo- 
graphic history of men and women in TwMtA. 

Haplogroup Olal*-P203 was seen at high frequencies, 
ranging from 40% to 60% in all southern and eastern 
TwMtA (Paiwan, Puyuma, Paiwan, Rukai, Amis and Yami), 
and was above 87% in all northern and central mountain 
tribes (Atayal, Taroko, Saisiyat, Thao and Tsou). Conversely, 
Olal*-P203 rarely exceeds 20% in the Philippines and 
Indonesia (Figures 2 and 3, and Additional file 1: Table S2 
and Additional file 1: Figure S2), but it has been reported 
as common in south and east China where it most likely 
originated [42]. We note also that the para-haplogroup 
Olala-MlOl was not observed in our dataset covering 
ISEA, and thus extend in this way knowledge from previ- 
ous reports indicating that it must be rare in most regions 
of continental East Asia [30,42,62,65]. 

With respect to Olal*-P203, it is possible that this 
haplogroup reached Indonesia through gene flow from 
the mainland, via the Indochinese peninsula. We note 
however that Olal*-P203 is uncommon in Indochina. 
Moreover, the presence of Olal*-P203 among the Korean 
[76], the Han from Fujian and TwHan (Additional file 1: 
Table S2) rather argue for a common origin in an ancient 
dispersal from the mainland (Figure 3). 

A second, major O-clade haplogroup observed in Taiwan 
is Ola2-M50, where it is frequent in several Aborigine 
populations, especially so among southern TwMtA, but 
rather rare in the Han populations of the island. As this 
haplogroup has been reported as frequent (from 3% to 
25%) among some Daic-speaking groups from southern 
China and Hainan, it was suggested that its expansion 
throughout ISEA followed an OOT model of migration 
reaching sequentially, Taiwan, the Philippines and Indonesia 
[26,65]. Our results support this hypothesis by demon- 
strating a decrease in frequency and molecular variation of 
Ola2-M50 from Taiwan to the Philippines and Indonesia 
(Figure 4, Table 4 and Additional file 1: Table S2). The pat- 
tern of STR variation is compatible with a late Paleolithic 
origin of this haplogroup in SEA with migration and ex- 
pansion in Taiwan (-17 Kya), followed by pre-Holocene 
passages to the Philippines (-12 Kya) and Indonesia (~7 
Kya) (Figure 3). 

A remarkable result emerging from our dataset is the 
high frequency of the 02 clade, more specifically of 
haplogroup 02ala-M88, found in the Bunun (37.5%, 
Additional file 1: Table S2). Except for the presence of 
02ala-M88, as well as 02al*-M95 in the Yami, this is the 
only detected occurrence of the 02-clade in a TwMtA 
group. Of course, the absence of 02 in a sample does not 
exclude its presence in the population but it does suggest 
that it is likely less frequent. We observe that the Bunun 
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share their most common Y-STR haplotype of 02ala-M88 
with the Daic- speaking groups and with the Indochinese 
and Indonesian populations, while they share none of the 
haplotypes found among Solomon islanders, TwPlt and 
Han (Figure 4). Actually, the Bunun STR haplotypes do 
not belong to the same branch as the Solomon Islands 
or Madagascar [7,66,67]. To our opinion, these results, 
together with the observed scarceness of the 02 clade 
among TwMtA and the Philippines, preclude the TwMtA 
as a plausible contender for the paternal origin of 02 
haplogroups in Madagascar and the Solomon Islands, 
although the possibility that the 02 clade was lost by 
drift in most Austronesians from Taiwan and in the 
Philippines remains. 

Haplogroup 03 constitutes the molecularly most di- 
versified clade observed in continental East Asia [7,8, 
26-31,40,68]. The introduction of new Y-SNP markers 
for better assignment of 03 subtypes allowed us to 
demonstrate several characteristics that were not seen 
before [8,43,44]. Haplogroup O3alc*-IMS-JST002611 
has been reported in Japan, Korea, Tibet, south China and 
Indochina [62,65,76,77]. Its high prevalence observed in 
this study in Fujian Han and Taiwan Minnan and Hakka 
(14% to 26%), but very low occurrence among TwMtA 
(Yami, 3%) and in the Philippines (0.8%), likely represents 
a signature of Han-mediated gene flow. Indeed, the lack 
of a clear clinal pattern of frequency variation of this 
haplogroup between regions, the low frequency but high 
diversity observed in Indonesia (Figure 2 and Table 2) 
and the rather faint star-like shape of the MJ network 
showing numerous long branches and low frequency 
nodes (Figure 4) all suggest a recent spread of 03alc* 
from mainland SEA. Thus the 03alc* high frequency 
observed for the Ivatan (25%) of the Batan archipelago, 
north of the Philippines, associated to a low age estimate 
(0.9 Kya, Table 2 and Additional file 1: Table S3), is con- 
sistent with a recent introduction of this haplogroup by 
gene flow with TwHan or TwPlt. The presence of Han lin- 
eages at most nodes of the star-like MJ network supports 
this hypothesis (Figure 4). 

Both the frequency distribution of haplogroup 03a2b*- 
M7 (which is present at low frequencies in Taiwan, among 
the Han and TwPlt) and its MJ network are consistent 
with repeated introductions of this haplogroup by gene flow 
from mainland Southeast Asia into the islands (Figure 4). 
As already stated, we observe this haplogroup at frequen- 
cies of 7% or higher in mainland SEA. Previous studies also 
reported the presence of this haplogroup in SEA, in Yun- 
nan, Fujian, and among the She and Yao ethnic groups 
[30,40]. On another hand, haplogroup 03a2c*-P164, whose 
origins also likely trace back to mainland SEA, is quite fre- 
quent in Taiwan (especially so among the Amis), whereas 
the derived 03a2cl*-M134 and 03a2cla-M133 are rather 
rare among TwMtA. In our dataset we observed that they 



often occurred together (Additional file 1: Table S2), 
and given the recent dates estimated for their STR di- 
versification (~8 to 6 ± 4 Kya ago) (Figure 3, Table 2 
and Additional file 1: Table S2), it is likely that their spread 
southwards to Vietnam, Laos and Thailand, as well as to 
Taiwan, the Philippines and Indonesia, concomitant with 
a northward spread to Japan [26,77], occurred during 
Neolithic times. 

The pincer model and the four stages of migration 

It has been proposed that, after the initial colonization 
approximately 50 Kya of SEA and Indochina by modern 
humans bringing along haplogroups C, M and S, the ori- 
gin and spread of most haplogroups seen today in east 
Asia, Taiwan and ISEA could be retraced according to 
four stages (A to D) of a paternal demographic model 
[26]. Stages B, C and D correspond to gene dispersals tak- 
ing their origin in mainland Southeast Asia or Indochina 
and whose directions of flow form a pincer model, with a 
northern branch spreading through Taiwan and a south- 
ern branch through Indonesia. In this pincer model, the 
Philippines appear as a confluent region of genes acquired 
separately from Taiwan or Indonesia and more recently 
from the Asian mainland. The patterns of genetic diversity 
observed in this study are consistent with this scenario in 
that several haplogroups displaying a cline with lower 
Y-STR diversity over Western Indonesia (i.e. 01a*-M119, 
Olal*-P203 and Ola2-M50) or lower diversity over 
Taiwan (02al*-M95) show diversity higher than expected 
in the Philippines, as attested by the corresponding age 
estimates (Table 2 and Additional file 1: Table S3). 

The northern branch postulated by the pincer model 
would have first reached Taiwan, thus introducing in the 
island haplogroups 01a*-M119, Olal*-P203 and 01a2- 
M50. Further, these haplogroups also display a decreasing 
cline in frequency from Taiwan towards the Philippines 
and Western Indonesia (Figure 3). 

The MJ networks associated with the Ol clade are the 
only ones in which Y-STR haplotypes of TwMtA are ob- 
served among the fO founder nodes (Figure 4). Along with 
the very low frequency of non-Ol clades among TwMtA, 
these results suggest that haplogroups Olal and 01a2 rep- 
resent the earliest traces of the Austronesian-agriculturist 
dispersal to Taiwan. Furthermore, TwMtA haplotypes of 
other haplogroups (02-P31/02a-Pk4/02al-M95; 02ala- 
M88; 03a2b-M7 and 03a2c-P164) were rarely seen among 
the fl founder nodes, consistent with the hypothesis that 
later direct east-west sea passages occurred between main- 
land East Asia and Taiwan. Thus, the Y-SNP haplogroups 
frequency distributions and their STR diversity observed 
today in TwMtA populations give support to the northern 
branch of the pincer model (the Taiwanese branch). This 
Austronesian-agriculturist dispersal most likely expanded 
principally within the boundaries of present-day Taiwan 
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and the Philippines before reaching Western Indonesia 
which was already populated by indigenous hunter- 
gatherers, possibly of Asiatic origin [64]. 

Populations that first went south from Southeast Asia 
along or from the Indochinese peninsula, Malaysia, 
western Indonesia (Sumatra, Java, and Borneo) and the 
Philippines represent the southern branch of the pincer 
model This dispersal would include haplogroups Olal*- 
P203, 02al-M95/M88, 03a*-M324, 03ala-M121, 03alc*- 
IMS-JST002611, O3a2*-P201 , 03a2a-M159, 03a2b*-M7, 
03a2c*-P164 and 03a2cla-M133 [14,26,27,67,78]. For 
most of these haplogroups, it is currently held that they 
first expanded and diversified within the boundaries of 
present-day southeast China, Indochina and Indonesia, 
and they are considered as involved in a Paleolithic contri- 
bution from mainland Asia by Karafet et al. [26]. 

The tips of the two branches of the pincer model 
may have, at times, reached and crossed over in the 
Philippines. Such bidirectional dispersals have been previ- 
ously proposed on the basis of specific patterns of mtDNA 
diversity, namely the northward spread of mtDNA hap- 
logroup E from the Philippines to Taiwan through gene 
flow [17] and the southward frequency gradient of 
mtDNA haplogroup B4ala [18,19,79]. This scenario is 
consistent with, although not statistically reinforced by 
the unexpected increase in diversity of haplogroups 
Olal-P203 and 03a2c-M164 observed in the Philippines 
(Figure 3, Table 2 and Additional file 1: Table S3). 

The pincer model described here applies mostly to 
western ISEA. The two branches rejoin further east 
through ISEA where significant eastward decreasing 
frequency clines of most haplogroups of the O clade 
have been observed, although with resurgence of some 
derived haplogroups of the 03 clade in the Bismark 
archipelago and Polynesia [80,81]. Most likely, part of 
this scenario may be ascribed to the "out of Taiwan" 
model [26]. 

Our results are in agreement with the multidirectional 
gene flow out of SEA described by Li et al. for haplogroup 
"Ola*', which was further resolved here as Olal*-P203 
[26,27]. Although the genetic relationships of the 
Philippines with Taiwan and Western Indonesia are well 
supported in this study on the basis of Y-SNP variation, 
the fast mutation rate of Y-STRs, together with likely long 
periods of isolation of the groups, contributed to more 
distinct structuring between regions and low present-day 
haplotype sharing (Figure 4 and Tables 2 and 4) [32-34]. 
Indeed, the significant genetic differentiation observed be- 
tween regions of the mainland and western insular SEA 
evidenced by AMOVA analyses would not be expected if 
substantial gene flow between these regions would have 
been maintained after the initial dispersals held by the 
pincer model (Table 4). Furthermore, the rare instances of 
sharing of Y-STR haplotypes observed between distant 



regions, bypassing regions in between (i.e. the occurrence 
of nodes shared uniquely between Indonesia and TwMtA 
in the Olal-P203 MJ network, Figure 4), may be the 
product of later admixture through significant maritime 
trade activities [82,83], rather than admixture resulting 
from land driven migration of peoples where gene con- 
tinuity in the MJ networks should have been evidenced 
(Figure 4). 

Correspondences with the four stages of dispersal model 

Consistent with the first stage (A) of Karafet et al. [26], 
we found that ancient traces of the initial settlement of 
modern humans in East Asia -50 Kya ago were retained 
in the Philippines, through the occurrence of haplogroups 
F* and K* in this archipelago. While possibly coming from 
SEA, the higher frequency of haplogroup K in the 
Philippines than in Indochina and Southeast Asia [11,36] 
suggests that the presence of two individuals in Taiwan 
bearing this haplogroup could result from gene flow 
between the Philippines and Taiwan [18]. 

More generally, the presence of haplogroups C, K* or 
F* in Western Indonesia may be associated with the 
Paleolithic colonization of ISEA, whereas that of hap- 
logroups R and H is likely linked to more recent migrations 
from Arabs, Indians (or Eurasians) and need to be deter- 
mined further. Lastly, haplogroups D, N, P* and Q-M242 
may be associated with isolated migrations from central 
Asia [8,14,50,84]. Since the Akha made their way from 
China into South East Asia during the early 1900s the 
prominence of Q-M242 in the Akha (56%) is intriguing 
and more specific determination of this haplogroup 
should reveal if a central Asian origin and spread into 
SEA is supported or whether Q* was introduced from the 
south, coming from the Indian subcontinent [84,85]. 

We confirm stage B of Karafet et al. [26] coalescence 
results indicating that dispersal within SEA may have 
initiated in late the Paleolithic since the age estimates of 
the diversity of haplogroups Ola*, Olal*, 02al* and 
03* inferred here for the whole region are of 18 Kya to 
14 Kya (Figure 3). 

Stage C describes later migrations/gene flow, between 
approximately 8 to 6 Kya ago, corresponding to the early 
Neolithic Austronesian expansion and that brought not 
only Olal*-P203, Ola2-M50/M110 and O3a2*-P208 [26] 
but also 02al*-M95 (nowadays observed in the TwPlt), 
02ala-M88 (observed in the Bunun), and 03a2c*-P164 to 
Taiwan, the Philippines and near Oceania, the latter most 
likely in association with the Out of Taiwan dispersal 4 
Kya ago (Figure 3). 

In addition to 03a2b*-M7 in stage D of Karafet et al. 
[26], we propose to consider also 03a*-M122 as a late 
introduction in Taiwan from the mainland, as well as 
haplogroups 03alc-2611 and 03a2cl-M134 which could 
have reached Taiwan and Western Indonesia separately, 
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through the northern and southern dispersal branches of 
the pincer model, respectively. 

Genetic relationships with Madagascar and the 
Solomon islands 

Madagascar and the Solomon islands may share a common 
paternal molecular past. While haplogroup Ola2-M50 is 
commonly seen in Taiwan and moderately distributed 
throughout western ISEA, 02al-M95 is rare all over the 
Philippines [7], common in Western Indonesia [7] and 
even more so in Indochina. Interestingly, haplogroups 
Ola2-M50 and 02al-M95/M88 were seldom observed 
together in the same population in our dataset or in pub- 
lished data [7]. Their co-occurrence in four well separated 
groups, Madagascar, the Solomon Islands, Bunun and 
Western Indonesia is intriguing and most likely implies a 
complex relationship between a series of male migration 
events from Western Indonesia [66,68,81,86]. We note 
that the core node (fO) of the 02al-M95/M88 MJ network 
(Figure 4) comprises haplotypes presently shared between 
Indochinese, western Indonesians, Han, Daic groups on 
the mainland (and Hainan) [27,36] and a few TwPlt, but 
not the Bunun whose founder haplotype belongs to an fl 
isolated node containing uniquely 02ala-M88 chromo- 
somes. It is possible that the missing 02ala-M88 among 
all TwMtA in Taiwan except for the Bunun is due to an 
isolated migration event from mainland Southeast Asia 
that did not affect already isolated TwMtA tribes. By 
contrast, the presence of 02 haplogroups in the Yami is 
most likely due to later gene flow between Taiwan and 
Tao Island (Orchid island) [25]. 

We also found that all nodes leading to the Solomon 
Islands in the 02al-M95/M88 MJ network include Y-STR 
haplotypes also observed in Madagascar (Figure 4). 
In other words, the MJ networks of Ola2-M50 and 
02al-M95/M88 provide clear evidence of a genetic 
footprint of common ancestry between Malagasy and 
Solomon islanders. It is a generally accepted view that 
Ola2-M50 traces its origin back to Taiwan [63] which 
leads to the assumption that 02al-M95/M88 followed the 
same route. But considering the scarcity of 02al-M95/ 
M88 in the Philippines, it is likely that haplogroups 
Ola2-M50 and 02al-M95/M88 reached Indonesia through 
separate dispersals, Ola2-M50 through the northern 
branch of the pincer model and 02al-M95/M88 through 
its southern branch, and that these branches coupled in 
Western Indonesia, possibly in Borneo or Sumatra. Later 
migrations drove this haplogroup, as well as 03a2c-P164, 
eastwards to the Solomon Islands, and westwards to 
Madagascar. These migrations could have also taken along 
mtDNA haplogroups F3b and M7c3c [14,18,19,66], and 
may have happened at the time when the Malagasy motif 
of mtDNA haplogroup B4alala (with mutations at np. 
1473 and 3423) appeared [86]. 



The Taiwan plain tribes 

Lastly, we found that the Taiwan plain tribes share more 
Y-STR haplotypes with Taiwanese Han (12%) than with 
TwMtA (3%) (Table 3, LS relative frequency). On the 
other hand, 77% of Y-STR haplotypes in TwPlt were not 
shared with any other group. Dating estimates were ob- 
tained from these unshared haplotypes in the background 
of their corresponding haplogroups (Olal-P203, 01a2- 
M50 and 03a2c-P164, whose frequencies are of 16%, 6% 
and 18%, respectively) using MJ sub-network clusters that 
show continuity between lineages. Values ranging from 
2,616 ± 1.806 Kya to 6.458 ± 2.951 Kya (data not shown) 
were obtained. Although the standard deviation associated 
with these assignments is large, these results raise the 
possibility that, rather than having evolved in the TwPlt 
as a result of long isolation, a significant amount of 
unique Y-STR diversity may have been acquired from 
different waves of settlers than those whose descendants 
are observed today in urban centers. 

Conclusion 

Our fine-scale study of Y-chromosome polymorphisms 
in Southeast Asian populations supports the view that the 
male genetic diversity of present-day populations living 
in Taiwan, the Philippines and Western Indonesia was 
formed through a complex pattern of settlement and 
dispersals, here coined the pincer model, that perfectly 
matches expectations derived from previously described 
models [26]. This pincer model includes two dispersal 
routes to ISEA, namely a northern route, from the 
mainland through Taiwan and a southern route through 
western Indonesia. Age estimates of the diversity of the 
major Y-chromosome haplogroups observed in SEA all 
fall approximately within the last 20 Kya. This puts an 
upper bound to the starting time of dispersals in the 
pincer model that is in agreement with previous studies 
that concluded to a settlement history of ISEA starting in 
the late upper Paleolithic and continuing during the early 
Neolithic, notably with the expansion of Austronesian 
peoples [14,18,26,27,87]. It is during this last period that 
gene flow between specific islands likely intensified, 
notably in the Philippines, where high genetic diversity 
of several Y-SNP haplogroups seems to be due to inde- 
pendent gene flow with either Taiwan or Indonesia. A 
further likely contribution to the high Y-chromosome 
diversity in the Philippines is gene flow due to sailing 
traders from the southeast coast of China or Eastern 
Indochina (Vietnam) [82]. However, genetic drift due to 
the late upper Paleolithic to early Neolithic isolation of 
populations settling in the islands may have substantially 
contributed to the significant differentiation of male 
Y-SNP and Y-STR pools between SEA regions. On another 
hand, it was found that Madagascar and the Solomon 
Islands might share part of their paternal ancestry in 
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Western Indonesia (or the southern Indochinese peninsula), 
in a population having acquired haplogroups Ola2-M50 
from Taiwan and 02al-M95/M88 from Indochina. Lastly, 
although the paternal genetic pool of the Taiwan plain 
tribes closely resembles that of Han populations from 
Taiwan and Fujian, the high diversity of non-shared haplo- 
types found in the plain tribes suggests a settlement in 
Taiwan that pre-dates the arrival of Han in the island by 
some 2 to 6 Kya. 

In the future, complete human genome data analysis 
and more pertinent and larger sampling of populations 
should help in obtaining a better assessment of drift, 
admixture and the timing of migrations events between 
continental East Asia, Taiwan, and other regions associ- 
ated with the wide dispersal of the Austronesian societies. 

Additional file 
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